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^ ■ 1 Introduction 

O 

fSl ' When modelling extremes of environmental phenomena often we wish to understand their 

$H ■ behaviour over a region, in particular dependence between the extremes at different loca- 

tions. This is typically hindered by two things: first that extreme events are by definition 
rare, and second a lack of locations where data have been gathered. Many situations exist 
in which understanding dependence between extremes is important, especially for environ- 
mental phenomena. For example here interest lies in estimating extreme rainfall. If areal 
estimates can be produced then rainfall amounts accumulated over a river's catchment 
W ' could be understood and in turn this could lead to estimates of susceptibility to flood- 

ing which are vital for the insurance industry. Ship building is another area in which an 
understanding of dependence between extremes is important because the level of punish- 
ment experienced by a ship on a given journey will be affected by the level of dependence 
! ■ between sea waves at different locations. 

The aim here is to produce estimates of extreme rainfall for a large region of the UK 
^ I where conventional time series data exist but only from rain gauges at a small number 

cn ' of locations. To overcome the spatial sparsity of the data, they will be supplemented 

^' . with simulator output — from a regional climate model for example — in order to benefit 

^vq I from the simulator's richer spatial provision, which can typically be specified. As a result 

_^ ■ we hope to improve estimates of extreme rainfall over the region under study. While 

^^ . we focus on estimating extreme rainfall, there are many different simulators for many 

^N I different phenomena, and the robust approach that we take can be extended to many 

other applications. For example, we consider only conventional data from rain gauges, 
though in the ship building example wave height data may come from buoys, oil-rig- 
^ I mounted equipment or even satellites, all of which may be spatially sparse, but all may be 

supplemented with simulator output to bring more accurate spatial estimation of extreme 
wave heights. 

The remainder of this paper is as follows. In ^ we outline univariate results for 
modelling extremes, introduce extensions to the methodology to incorporate spatial de- 
pendence and conclude by showing how model parameters may be estimated. In ^ we 
establish a link with and describe previous approaches to downscaling extremes, introduce 
notation, outline our proposed method for spatial interpolation of extremes and finally 
extend the spatial model for extremes to incorporate this. In §l]we describe a variety of 
checks to assess the fit of the model. Then in ^we analyse extreme rainfall for a central 
region of the UK using the model. Finally in f|6] we summarise the work presented. 
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2 Spatial modelling of extremes 

2.1 Univariate background 

This section primarily describes the underlying class of spatial extremal models that will be 
used in subsequent modelling of extreme rainfall, beginning with the original asymptotic 
extremal theory on which the model is based. Consider a strictly stationary sequence 
{Zi}, i = 1, . . . ,n, and define M„ = maxj=i^...^„ Zi. If constants a„ > 0, bn exist such that 
as n ^ oo then 

pT[a-\Mn-bn)<z]^G{z) (l) 

where G is a nondegenerate distribution function then G is the generalised extreme value 
(GEV) distribution 
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defined when ip > 0, ior {z : 1 + ^{z — iJ.)/ip > 0} and where the case ^ = results from 
the hmit ^ — > 0. 

Relying on the asymptotic results of equations ([1]) and ([2]) and by assuming equation 
([1]) to be approximately true for sufficiently large n, a statistical model may then be formed 
for a sequence of data zi,Z2,- ■ ■ by dividing it into plausibly homogeneous blocks all of 
size n and then assuming that the resulting block maxima follow a GEV distribution. 
Quantiles of the GEV distribution have a more natural interpretation than its parameters 
themselves, and are commonly reported from an extremal analysis. Specifically if qp 
satisfies G{qp) = 1 — 1/p then it is referred to as the p-year return level. For a stationary 
sequence it may be regarded as the level above which only one exceedance is expected in 
p years. Based on equation ([2]) qp is given by 

{fi-—{l-yp^) when ^ 7^ 0, 
/i-V^log(yp) when ^ = 0, 

where Vp = — log(l — p). 



2.2 Spatial framework 

The GEV model is now extended to a spatial context, with particular emphasis placed on 
modelling environmental phenomena. Assume that at each point s in some region R C IR^ 
time series data for some process exist which are divided into blocks resulting in block 
maxima Xt{s), t = 1,2, .. .. To ensure approximately similar behaviour within blocks and 
equal block sizes, a common choice for environmental data is to use annual maxima. Here 
when modelling extreme rainfall we will consider annual maxima of daily rainfall data, 
more details of which will emerge in the later application. At each location s assume that 



[Xt{s)\fiis), iPis), e(s)] is GEV{^,{s), ^(s), e(s)) 



(3) 



where [•] denotes "distribution of". The spatial model will adopt a hierarchical structure 
and relation ([3|) will be referred to as its data layer. For environmental data it is often 
the case that the dependence between Xt{s) and Xt{s'), for locations s,s' £ R, relates to 
their relative locations in space, or more simply to their distance apart. We capture this 



through the GEV parameters by letting dependence exist between (^fj,{s),ip{s),^{s)^ and 

(/i(s'), ^(s'),^(s')) and decay as a function of distance. Furthermore, all spatial depen- 
dence is assumed to be characterised through the GEV parameters so that consequently 
Xt{s) and Xt{s') are conditionally independent given their respective GEV parameters, 

for all pairs s, s' € R, which we shall r efer t o as the conditional independe nce assumption. 

As f irst used bvlCasson and Colea (119991) . and in subse quent variants bv lFawcett and Walshaw 
( 20061 ) . ICoolev et al.l ( 20071 ) and lSang and Gelfandl ( 20091 ). for example, we use a Gaussian 
process (GP) to characterise dependence between GEV parameters. For the present appli- 
cation the GP offers many benefits: the ability to be used for high-dimensional problems, 
ie. for data at many locations; ease of spatial interpolation using conditional Gaussian 
arguments; and the plausibility of the joint and marginal assumptions about variability 
induced on GEV parameters. First consider a GP assumption for the GEV location pa- 
rameter /i(s). This forms one spatial process layer of the hierarchical model in which 

[f^{s)]isGP{m{s),a^c{, )), (4) 

for mean function m{s), underlying variability o"^ and correlation structure c( , ). Allowing 
m(s) to depend on s lets covariate effects be introduced, which is particularly attractive for 
environmental data. Then the belief of a decay in dependence with distance is incorporated 
through the correlation structure. The exponential structure offers decay in a simple and 
intuitive way, but here a slightly more relaxed modelling assumption is preferred and so 
we choose the powered exponential structure. 



c{s, 



l + T^a^ 
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(5) 



where < 6 < 2, < a, r a nd < i;^; only if r > is the GP discontinuous everywhere. 



Gneiting and GuttorpI (J20ld ) offer further choices of correlation structure. GPs may be as- 
sumed for ip{s) and ^{s) similarly, though it is more natural to work with p{s) = log{'0(s)}, 
to ensure the parameter's positivity. 



2.3 Model estimation 

To estimate model parameters for th e present problem we us e an adaptation of the Monte 
Carlo EM algorithm, introduced by IWei and Tanneii (Il990l): more spe c ific de tails of the 
algorithm related to the present problem can be found in iMcCullochI ( 19971 ). Here the 
method of parameter estimation is found to have many benefits, including not being 
unduly sensitive to starting values, converging reasonably quickly, depending on the level 
of accuracy sought, and avoiding prior specifications on parameters, such as those in 
c( , ), to which final parameter estimates can be sensitive. We outline the algorithm by 
considering the simplified case in which [Xt{s) \ fJ-{s)] is GEV(^iJ,{s), V; ^) to ensure that 
estimation of parameters in both the data and spatial process layers is illustrated, which 
would not be possible if GPs were assumed for all parameters. Furthermore we adopt 
such a specification in the extreme rainfall application of [|5j The estimation procedure, 
however, extends readily to alternative formulations in which different combinations of 
GEV parameters are assumed to follow GPs. 

Let 02 denote parameters characterising the GP distribution of /-f(s), so that the full 
parameter set is ^ = {6i, 62) where 9i = (-0, 0- The GEV[fi{s),tl^, ^) density will be 
denoted /i( | fJ.{s), 9i) and the GP density relating to fi{s) denoted /2( | ^2)- To achieve a 
maximum likelihood estimate of 9, 9 say, based on a finite set of locations S = {si, . . . , sd}, 



we wish to maximise 



T D 



ds. 



(6) 



\T 



where fi{s) = [fi{si), . . . ,fi{s£)))^ . The integral of equation ([6]) is D-dimensional, which 
can either significantly hinder or even prohibit the finding of its analytical solution, in 
particular in spatial applications where D may be large. In the standard EM approach 
to parameter estimation the random /i(s) is treated as missing data giving complete data 
z = (x, /x(s)) where xt{s) = {xt{si), . . . ,xt{sD)) and x = (xi(s), . . . ,xt(s)). Then, 
taking logarithms of the likelihood in equation ([6]) , we require parameters that maximise 
the expected log likelihood 



T D 



X 



(7) 



t=i 



However, the expected log-likelihood of equation ([7]) is again typically complex, beyond 
the finding of an analytical solution to its maximum. Draws from [/^(s) | x] can however be 
obtained using a Metropolis- within-Gibbs sampling procedure, and consequently a Monte 
Carlo estimate of the e xpectation i i i equ ation ([7]) can be achieved; efficient choice of 
proposals is discussed in iMcCullochI (j 19971 ). Let /^i(s), i = 1,. .. ,N, denote draws from 
[/x(s) I x]. Then for the Monte Carlo EM algorithm we require parameters that maximise 



NT D 

]^E{E([Ei°g{/iM^^)i/^^(^)'^i)}]+i°g{/2(/^.(^)i^2)})}. 
i=i t=i j=i 



(8) 



Recognising that the left- and right-hand sides of the sum in equation ([8]) depend only on 
parameters 9i and 62 respectively, the sum may be divided into two sums accordingly and 
parameter estimates reached by maximising each sum separately. 



2.4 Uncertainty estimation 

The conditional independence assumption of ^2.21 implies that, given GEV parameters, 
annual maxima at different locations will be independent and have variance equal to their 
corresponding GEV distributions. For the present rainfall application, we can imagine that 
almost identical rainfall levels will be experienced at locations sufficiently close together, 
that is where we expect variability to be less than assumed GEVs. While this model 
misspecification will not affect parameter estimates, the Fisher information associated 
with the MCEM likelihood can no longer be used to give reliable estimates of parameter 
uncertainty. Consequently w e modify the sand wich informa tion correction, originating 
from works by iHuberl (jl967l ) , lEickeiJ (jl967l ) and I White! (jl982l ) , so that it is applicable to 
a MCEM likehhood. 

We illustrate this modification to the sandwich information correction by considering 
only the data layer of the model, ie. for the parameters 9i, primarily based on the above 
example of potential model misspecification; however, extending this procedure to the 
process layer requires simple alteration. Because not all GEV parameters may be assumed 
to follow GPs, the case in which [Xt{s) \ /u(s)] is GEV[fj,{s), ip, ^) is again considered. Let 



e{9i; xt{sj), fii{s)) = log {fi{xt{sj)\^i{sj), 6*1)} 



and, with 6i = (0i,i, . . . , Oi^ne ), let 

j{9i ; xt{sj), fii{sj)) = Vi{0i ; xt{sj), fj.i{s)) 
with A;th element 

jk{0i ; xt{sj), /Ui(sj)) = — — £{01 ; xt{sj), /Xi(sj)) 
k = 1, . . . , ng-^ . Then write 

N D T 

J{^i) = J^^^^3{^^'^ xt{sj), fJ-iisj)) {j{Oi; xtisj), fJ,i{sj))y 
1=1 j=i t=i 

Let H{6i) have (/,?Ti)th element 



h(i^m){Gi) 



d 



dOi^idOi,, 



N T D 



]v 5ZZ]Z]^(^i ' ^*(^i)' /^^(^j)) 



i=l t=l j=l 

This leads to the final estimate of the covariance matrix for 9i 

01=6*1 

where 9i is the estimate of 9i that maximises the MCEM likelihood of equation 



3 Spatial interpolation using computer simulator output 

3.1 Background 

In this section we introduce a method for spatial interpolation of extremes based on supple- 
menting field data — eg. resulting from a measurement or observation — with output from 
a numerical model, or computer simulator as we shall refer to it, such as a regional climate 
model (RCM). Our motivation is the desire to produce predictions of extremes over an 
entire region that capture spatial dependence where field data are spatially sparse; conse- 
quently simulator output is also used in order to benefit from its high spatial resolution. 
The predictions produced will be representative of point level, in theory allowing contin- 
uous maps for entire regions to be produced. In practice maps representing discretised 
regions at arbitrarily fine scales will be produced. Our motivation for spatial interpolation 
shares similarities with statistical downscaling, in which large-scale data are downscaled 
so that inferences about finer scales can be made. Due to this similarity we review a se - 
lecti on of its correspondi ng literature. The reader is referred to IWilbv and Wiglevl (jl997l ) 
and lMaraun et al.l (2010|) for more comprehensive reviews. 

The most developed statistical downscaling methods use stochastic weather genera- 
tors or tr ansfer functions. Stochastic w eather generators originate from the wet-dry day 
models of iGabriel and Neumann! (jl962l ) in whi ch transitions between we t and dry days 
have Markov structure. An extension of this bv lKatz and Parlangd ( 19961 ) is to assume a 
mixture distribution for the rainfall amount on a wet day, the parameters of which vary 
according to output from a large-scale mo del. More complex stochastic weather genera- 
tors have also been proposed. For example, iKilsbv et al.l ( 20071 ) condition a rainfall model 
and weather generator on a wet-dry day model, deriving parameters for the model from 



past and future global climate model runs, thus allowing statistics from the rainfall and 
weather generators to vary between climate scenarios. 

A variety of methods have been developed to account for differences between aggre- 
gated ar id point-level extrenies. W ith the goal of understand ing future fine-scale extreme 
rainfall, iHuntingford et alJ ( 20031 ) and lKallache et alJ ( 20 111 ) use similar approaches that 



establish relationships between extremes of past and future epochs through GEVs fitted 
to annual maxima of rainfall accumulations generated by ROMs. GEVs fitted to annual 
maxima of past station data are then transformed accordingly to gi ve quantile-based esti- 



mates of future point-level extreme rainfall. A similar approach by iFriederichs and Hense 



( 20071 ) uses quantile regression to relate quantiles of the distribution of rainfall accumula- 



tion at a given weather station, condition al on it having rained, to outpu t frorn . a spatially 
aggregated rainfall model. Alternatively, iMannshardt-Shamseldin et al.l ( 2010l ) develop a 



regression relationship between return levels estimated from both large-scale and point- 
level rainfall data and use this relationship to adjust large-scale return levels to represent 
point level. By using RCM data for future epochs this approach can also be used to give 
predictions of future point-level rainfall return levels. 

3.2 Data and notation 

For the remainder of this section the following notation will be used: Xp^tis) and XM,t{s) 
respectively denote annual maxima of field data and aggregated simulator output for an ar- 
bitrary location s G R and time t, t = 1, . . . ,T. The field data will be assumed to represent 
point-level in which interest here lies without bias. Being the result of aggregation, such 
an assumption of unbiasedness cannot be made for the simulator output; consequently we 
propose to convert the simulator output using a smooth function, denoted (?(), that will 
correct for scale difference between the data. In general the optimal form for gO may be 
unknown and in which case non- or semi-parametric forms may be useful, or parametric 
forms deemed not to impose unwelcome constraints. 

3.3 Spatial interpolation model formulation 

The model to be used for spatial interpolation is based on the hierarchical spatial model 
introduced in ^2.21 and is outlined having assumed that a form for g{) has been chosen, 
which for the application to extreme rainfall is discussed in ^ For the data layer and 
given GEV location, scale and shape parameters, n{s), ipi^) and ^{s) respectively, the 
GEV in which interest lies is assumed to be shared by annual maxima of the field data, 
so that 

[XfAs)] is GEV{fiF{s),i'F{s),CF{s)) 

for all s G i? and t = 1, . . . , T. Once transformed by g{) a related GEV is then assumed 
for annual maxima of the simulator output: 

[g{XM,ti^))] is GEV{f.iMis), iPm{s), ^m{s)). 

The preceding specification therefore allows the two different sources of data, quantifying 
the same phenomenon but on different scales, to be modelled jointly. Part of our motiva- 



tion for this joint modelling comes from [Anderson and TurkmanI (jl99ll ) in which results for 



the joint distribution of maxima and sums of sequences are derived by combining results 
from extreme value theory and the central limit theorem. 

The joint specification is completed by the spatial process layer. For this a GP is 
assumed which, considering the GEVs location parameter for illustration, may be given 



by 

[^.(s)] is GP{m.{s),alci,)) 

where a?,^ is a variance parameter, c( , ) represents a correlation structure and where . may 
be replaced with F or M to represent the separate, respective specifications of the field 
data and simulator output. Allowing different GP specifications between the two data 
types consequently allows their differences in scale to be absorbed not only by g{) but also 
by the GP. Similar GPs may also be assumed for p_{s) = log (V'.('S)) and ^{s). 

4 Model checking 

We consider a variety of methods for checking the fit of the latent Gaussian extreme value 
model described in ^2.21 and 



4.1 Quantile plots 

First we asses fit of the proposed model by considering the conditional GEV assumption, 
given in equation ([3]), using a modification of the quantile plot. The formulation of ^2.31 
in which ip{s) = ip and ^(s) = ^, is again used for illustration, though alterations for when 
GPs are assumed for other combinations of GEV parameters follow naturally. Modification 
of a standard quantile plot is required due to the GEV's location parameter being random. 

Suppose that at location s we have observed annual maxima xi{s), . . . , Xn{s), with or- 
dered counterparts x^^'{s) < . . . < x^"'>{s), that are assumed to follow a GEV[fi{s), ip, ^) 
distribution. (These should initially be thought of as annual maxima of field data; quantile 
plots for the simulator output can be achieved by replacing Xj(s) with g{xi{s)) through- 
out.) 

Recall from i j2.3l that fJ-i{s), i = 1, . . . N, draws from [/i(s) \xi, . . . , Xn], can be obtained, 
and then combined with estimates ^ and ^ of ^ and ^ so that a collection of GEV distri- 
butions, (?( ; fii{s)) and corresponding inverse functions G~^[ ;/ij(s)), i = 1, . . . ,N, that 
reflect the randomness of ij,{s) can be specified. A quantile plot appropriate for the present 
MCEM setting may then be formed by plotting the pairs 






; Mi(s)) , k = l,...,n. 



n+1 



Deviation from linearity of the pairs indicates model failure. The level of devia- 
tion expected may be estimated through Monte Carlo simulation, repeatedly sampling 
from G'(;^j(s)). Take Nq samples from G[;fii{s)^ and denote the ordered samples by 

x\ (s) < . . . < X-" (s), i = 1, . . . ,Ng', then take the /th order statistic from each sample, 
ie. x-[ (s), . . . ,x\^ (s), and denote their ordered counterparts by XrMs) < . . . < xL*^ (s), 
I = 1, . . . ,n. Finally, approximate 100(1 — a)% confidence bounds for the quantile plot 
at x^'^'{s) are given by (ijk "^ (■?)) Xn^\ '^ " (-s)) where [-J denotes "integer part". 
The accuracy of these confidence intervals can be improved by also accounting for uncer- 
tainty in the estimates ip and ^, and also of ^O when using the simulator output; however 
this modification tends to bring little change to the confidence bounds achieved. 

4.2 Spatial structure diagnostics 

This diagnostic is designed to assess the adequacy of the estimated spatial structure of the 
proposed model by considering how well it compares with empirical estimates of spatial 



dependence. Again we assume that [Xt{s) | /i(s)] is GEV[^{s), ip, ^). However, unlike 
the other model checks, this check does not extend readily to the case in which either or 
both of il^{s) and ^(s) are random, but is sufficient here given the formulation that we 
adopt when modelling extreme rainfall in ^ When [Xt(s) \ /i(s)] is GEV[fi{s), Tp, ^) we 
can write 

Xt{s) =fi{s)+e{s) 



where 



and fi{s) is as in relation (H]). Let var[e(s 
that 



[e{s)] is GEV{0, ^P, e) 

: cr^(s). Then for arbitrary s, s' € -R we have 



cov 



'Xt{s), Xt{s')) = coY{fi{s) + e{s), /i(s') + e{s')) 

= cov(/i(s), /i(s')) +cov(e(s), e(s')) +cov(;u(s), e{s')) +cov(/i(s'), e(s))- 

The conditional independence of e(s) and e(s') given /i(s) and /i(s') and independence 
between e() and /_f() gives cov(Xi(s), Xt{s')) = af^c{s,s') so that 



corr 



[Ms), Xtis')) 



c{s,s') 



1 + 



c^lis) 



erf, 



1 + 



a^s') 



erf. 



(9) 



Thus a plot of empirical estimates of corr(Xi(s), Xt(s')) against those expected under the 
model, given in equation ([9|), provides a method of assessing the model's spatial structure. 
Combinations of both field data and simulator output can be assessed by transforming 
annual maxima by g{) where appropriate. Note that for the GEV if .^ < 0.5 then 0"^(s) is 
finite, given by '0Hr(l - 2^ - r2(l - ^)}/^2 if ^ ^ q and by t/j'^Tr'^/6 if ^ = 0. 

In the case where '^(s) or ^(s) or both are random, the above procedure cannot be 
easily modified to provide a similar method of assessing any estimated covariance structure. 
However simulations from the model may instead be used to provide model-based estimates 
of corr(Xt(s), Xt(s')) which may be compared with empirical estimates. 



4.3 Crossvalidation 

A final way in which the fit of the model can be assessed is through crossvalidation, using 
kriging to predict annual maxima at locations with data though omitted during model 
estimation. While well documented in the literature, the procedure used is outlined again 
here as it will be relied on later for interpolation. Let s* denote a location for which a 
prediction is required and suppose that fJ-i{s), i = 1, . . . ,N, have been simulated from 
[fJ.{s) I x]; then we wish to simulate from [^{s*) \ /i(s) = Hi{s), x]. This is possible through 
properties of the Gaussian process as 



H{s) 



^^^^'Cd*))'^* 



where 



Then 



S* 



a 



2c(,) crlc{,s*) 



af,c{,s*y a 



S. 






[li{s*) I ^l{s) = ^Ji,{s)] is N{^iy,{s*), aUs*)) 



where 

//|^(s*) = m(s*) - Ss*,^S7^(/Xi(s) - m(s)) 

and 



0"i 



is*) = al-^s*,s^-'^s 



If GPs are assumed for V'!*) of ^(s), kriging may also be used to simulate from their 
respective conditional distributions; if not the MCEM estimates may be used. The result 
is that a complete set of GEV parameters may be found for s* and consequently quantile 
plots as described in ^4.11 may be used to assess whether model predictions are consistent 
with the data not used in model estimation. To account for uncertainty in the kriging 
estimate due to uncertainty in the parameter estimates on which it depends, simulations 
from the joint distribution of parameters can be obtained and then kriging estimates 
produced for each simulation. A potentially more useful application of this kriging-based 
procedure is the production of return level maps, which will be introduced in the context 
of extreme rainfall prediction in ^5.51 

5 Extreme Rainfall 

We now perform spatial interpolation of extreme rainfall using the model introduced in ^ 
Attention is restricted to a region that is primarily the South and Midlands of England, 
indicated in Figure [Tal choosing not to study the entire UK to aid proof of concept of the 
model. For example, this avoids some of the many coastline effects of extreme rainfall. 
Extensions to the present analysis, that would help in analysis of the entire UK, are 
discussed further in 

5.1 The data 

To estimate model parameters we use both field data and computer simulator output. 
The field data are annual maxima of daily rainfall accumulations from rain gauges at 15 
sites and are obtained from the UK Meteo rological Office's MIDAS Land Surface Stations 



database (JUK Meteorological Office!. 120061) . The computer simulator outpu t is the 0.5° E- 



OBS gridded dataset ( Havlock et al.l . l2008l : Ivan den Besselaar et al.l . l201ll ). also available 



as daily data. The locations at which data are used, identified by type, are shown in 
Figure llbl Rainfall accumulations from 1st January 1950 to 31st December 2009 are 
studied. Some years' field data are incomplete, in which case, provided these are believed 
to be missing at random, annual maxima are omitted from analysis if five or more days' 
measurements are missing. To give an idea of any systematic differences between the 
data sources. Figure [2] shows plots of field data against most proximate simulator output 
(defined by distance from rain gauge to nearest grid cell centre) for four locations that are 
labelled on Figure [Tbl 



5.2 Rainfall model specification 

Particularly important in the model specification is the choice of 5f(), which here we choose 
first. While the optimal form of the downscaling function is likely to be complex due to 
the complexity of the computer simulator, a flexible class of model arises from the choice 
g{x) = X, thus absorbing all differences between the different data sources through the 
GEV parameters and GPs. 

A variety of model specifications based on ^3.31 are explored, beginning with assuming 
GPs for all three GEV parameters, for each of which a variety of mean structures, based 
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(a) Study region 



(b) Data locations by type 



Figure 1: Region of UK studied (left panel) and locations of data used in model estimation, 
identified by type: (•) computer simulator output (grid cell mid-point), (•) rain gauge 
location. (Numbers within symbols identify sites 1, 2, 3, and 4 which are referred to 
later.) 



on covariates known to influence extreme rainfall, are considered. Initially covariates that 
may benefit the mean structure are assessed through marginal GEV parameter estimates, 
that is based on fitting GEVs independently to annual maxima at each location. For each 
of the GEVs three parameters, plots of parameter estimates against elevation, longitude 
and latitude are shown in Figure [3l When considering elevation as a covariate we note 
that its definition differs between the field data and simulator output: for the former it 
is simply the height above sea level of the rain gauge, whereas for the latter it represents 
elevation aggregated over the cell corresponding to the output. These differing definitions 
suggest using a separate trend in elevation for each data source, which is accommodated 
through the GP mean structure. Separate trends will also be explored for the longitude 
and latitude covariates because extreme rainfall quantified by the different data sources 
could react differently to changes in longitude or latitude, but not because of differing 
definitions. 

Many logical functional forms to capture relationships between the GEV parameters 
and covariates are studied. These are initially assessed through regression on the marginal 
parameter estimates, and later through effects of choice of GP mean structures on the 
MCEM likelihood, specifically the size of the likelihood relative to the number of model 
parameters. Irrespective of the mean structures for the GEV scale and shape parameters, 
or whether one or both of the parameters have GP form, their corresponding GP variance 
estimates are negligibly small. Consequently a GP structure is only adopted for the 
GEVs location parameter. Models in which ^ is constant, but differs between the data 
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Simulator output 

(a) Site 1, distance 10.8km 
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(b) Site 2, distance 15.7km 




(c) Site 3, distance 16.8km 
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Figure 2: Annual maxima of daily rainfall accumulations (mm) for field data against 
nearest simulator output. Distance represents that from the location of the rain gauge to 
the centre of the simulator's corresponding cell. The line of no bias ( ) is superimposed. 
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Figure 3: Spatially-independent GEV parameter estimates against elevation, longitude 
and latitude, identified by type: (•) computer simulator output, (•) observational data. 
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sources, are found to be most parsimonious. Thus any covariate effects are absorbed by 
the GEV's location and scale parameters. Both parameters are found to depend heavily on 
elevation, for which different linear trends are assumed between parameters and between 
data sources. Finally we find the GEV's location parameter to also vary with latitude 
and longitude, and incorporate this in the model through linear trends that differ between 
data sources. 

Using the . notation as in ^3.3|, the final model used is given by 



where 



[X.,tis)\fi.is)] is GEV{f,Xs),^P.is),t) 



[fi.is)] is GP{mXs),alci,)), 



with (T^ a variance parameter, c( , ) represents the powered exponential structure described 
in equation dS} and where 

7n.(s) = fi,fi + /i.^i X elevation{s) 

+ f^.,2 X latitude{s) + /i.^3 x longitude{s) 

and 

V'.(s) = exp{V'.,o + V'.,i X elevation{s)}. 

If the preceding analysis was to be extended to modelling extreme rainfall for the entire 
UK, one of the most significant changes that might benefit the above model would be to 
consider proximity of locations to the coast, and consequently to also possibly account for 
the direction of prevailing winds, and to incorporate these through further covariates. 

5.3 Model estimates 

All of the parameters estimated were introduced in ^5.2i Estimates of Vf,i and ^l^M,l 
from the data layer of the model, and of hf,o and hm,o from the spatial process layer, 
are shown for each iteration of the MCEM algorithm in Figure [H convergence appears 
convincing and as a result the MCEM method of parameter estimation is deemed to work 
well. Note that altogether we have data for D = 60 sites and perform 100 iterations. 
Initially for the MCEM algorithm we choose N = lOD and increase this by 10% at each 
iteration. By gradually increasing A'^ to its final value the speed of convergence is improved 
because an approximate estimate is reached quickly and is then made more accurate by 
the increase in N. This procedure also helps avoid finding only local as opposed to global 
maxima. Alternative initial parameter values were also tested, though all led to the same 
final estimates. Table [1] shows estimates for all parameters based on iteration 100. The 
accompanying standard error estimates for the data layer are achieved using the variant 
of the sandwich estimator introduced in ^2.4|, whereas those for the spatial process layer 
are based on the usual observed Fisher information. 

5.4 Model checks 

Initially the fit of the model is assessed using quantile plots, outlined in ^4.11 These are 
shown in Figure [5j The plots for almost all sites do not give reason to doubt the estimated 
model, as the points deviate little from linearity. For site 3, for example, this deviation is 
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Figure 4: Parameter estimates for -0.^0 and /u.^o at each of the 100 iterations of the MCEM 
algorithm. Initially N = WD. 

larger than for the other sites, and its form suggests that the annual maxima at that site 
may be consistent with a GEV with a lighter tail. However, as this deviation is within the 
confidence bounds given, and because in general the field data appear consistent with the 
estimated spatial model, the present check does not give cause for concern. Furthermore, 
while not shown in the present paper, related quantile plots for the simulator output, using 
the method mentioned in N4.ll are equally supportive of the estimated model. 

We proceed by using the method outlined in N4.2I to assess the fit of the estimated 
spatial structure. Upon simple inspection there are signs that conditional on the random 
GEV location parameters, the remaining variability in annual maxima is notably less than 
that of the assumed GEV distribution. Consequently we modify the estimate of equation 
^ so that we simply assume that cr'^ X^) = kip'^{s){T{l — 2^.) — r^(l — ^.)}/C^, noting 
that < ^. lb 2 X S.E.(^.) < 0.5, thus assuming that the residual variability is proportional 
to that expected under the model. Therefore, considering the correlation between annual 
maxima of the field data and simulator output for example. 



C01Cv[XF,t{s),XM,t{s')) 



{s, s') 



\ 




(10) 



for i = 1, . . . ,T and s, s' G R, noting that g{x) = x. Figure[6]shows a plot of corr(X^t(s),X^t(s')) 
against the estimate in the RHS of equation (|10|) considering all combinations of field data 
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Figure 5: GEV quantile plots of annual maxima from the field data against those expected 
under the model together with 95% confidence bounds for sites identified in Figure lib! 
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Parameter 


Estimate 


S.E. 


V'F.O 


1.96 


0.0659 


fJ'Ffi 


41.8 


14.7 


V'F,! 


0.000782 


0.000610 


A*F,1 


0.0342 


0.00206 


V'M.O 


1.76 


0.0180 


AiF,2 


-0.371 


0.228 


V'A/,1 


0.000986 


0.000191 


fJ'F,3 


-0.276 


0.283 


^F 


0.101 


0.0642 


fJ-Mfi 


33.0 


10.3 


Cm 


0.050 


0.00766 


/^Af,l 


0.0223 


0.00201 








/^Af,2 


-0.162 


0.167 








/^A//,3 


-0.205 


0.197 








CTf. 


0.0121mm 


0.131 








</> 


3.84km 


0.845 








6 


-0.643 


0.271 








T 


0.050km 


N/A 



Table 1: Parameter estimates for the model described in 
consequently has no S.E. estimate. 



Note that r is fixed, and 



and simulator output locations. Correlation estimates are binned based on the model- 
based estimates to ease comparison. The resulting plots of Figure [6] in general show 
that the model's estimated spatial dependence structure is consistent with its empirical 
counterparts once residual variability in annual maxima given respective GEV parameters 
has been eliminated, ie. when A; = 0. Without altering the residual variability, ie. taking 
A; = 1, we see from Figure[6]that the empirical correlations between annual maxima are sig- 
nificantly greater than expected under the model, indicating that the original conditional 
independence assumption, introduced in ^2.21 is violated. Thus use of the information 
sandwich correction to estimate standard errors associated with parameters in the data 
layer is vital for giving adequate estimates of parameter uncertainty. 



5.5 Spatial prediction 

Finally Figure [7] shows a map of the 100-year return level estimate, together with 95% 
confidence bound widths, for the region of the UK under study. The map is obtained from 
estimates of the 0.99 quantile of the GEV distribution for each location in the region. The 
multivariate normal distribution from which to simulate GEV scale and shape parameters, 
and consequently represent their uncertainty accurately, is given by arguments in ^2.41 
and uncertainty in the kriging estimate for the GEV location parameter is achieved by 
the method described in ^4.31 The location, scale and shape parameter samples can then 
be combined and to give a return level sample and then variability in the samples used to 
accurately quantify uncertainty in the return level map. 

One of the most prominent features of Figure [7] is its resemblance to a relief map of the 
region under study. This is a consequence of elevation being the most influential covariate 
included in the model, which can be seen from its corresponding estimates given in Table 
[TJ A further way in which the model's performance can be assessed is by crossvalidation; 
that is predicting annual maxima at sites with data but deliberately omitted from model 
estimation. Quantile plots similar to those shown in Figure [5] can then be used to assess 
fit. In general these display similar features to those of Figure [5l and as a result are not 
shown, but offer further support for the fit of the present model. Consequently the return 
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Figure 6: Plots of binned empirical against model correlation estimates for fc = (•) and 
k = l{.). 

level map is deemed to provide a plausible representation of point-level behaviour of the 
100-year return level for annual maxima of daily rainfall accumulations. 

6 Discussion 

In this paper we have provided a method for interpolating extreme rainfall at fine scale 
based on a coherent way of spatially pooling related though inherently different data. 
Point-level estimates of extreme rainfall can then be produced for an entire spatial region, 
which has been achieved here using rain gauge measurements at only a few locations. 
This estimation would otherwise not be possible if a marginal approach, in which GEVs 
are fitted independently at different locations, had been used. Furthermore this method 
offers the potential for estimates of areal rainfall, such as extreme rainfall accumulations 
for a river catchment area, to be obtained. While we have used measurements from only a 
few rain gauges, the model is equally applicable if measurements from considerably more 
gauges were used. 

This work has also shown that the MCEM algorithm can be used reliably to provide 
estimates of parameters in latent Gaussian spatial models for extremes, and introduced a 
simple diagnostic tool that allows model-based estimates of spatial dependence between 
annual maxima to be compared with empirical counterparts for the model formulation 
adopted here. Furthermore we have been able to overcome potential misspecification in the 
model, in particular violation of the conditional independence assumption, and still give 
adequate estimates of parameter uncertainty by introducing a variant of the information 
sandwich estimator applicable to the MCEM algorithm. 
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